
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture#32240

Merged
chaunceyjiang merged 12 commits into vllm-project:main from chaunceyjiang:vllm_open_refactor
Jan 13, 2026
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Jan 13, 2026

Purpose

This PR refactors the OpenAI chat_completion serving architecture and splits vllm/entrypoints/openai/protocol.py.

TODO
[ ] completion_serving
[ ] responses_serving
[ ] transcription_serving
[ ] tests re-org
[ ] compatibility with the previous import of vllm/entrypoints/openai/protocol.py

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added deepseek Related to DeepSeek models frontend llama Related to Llama models qwen Related to Qwen models gpt-oss Related to GPT-OSS models labels Jan 13, 2026
@chaunceyjiang chaunceyjiang changed the title [Refactor] [6/N] to simplify the vLLM openai serving architecture [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture Jan 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the OpenAI serving architecture by restructuring files and updating import paths. The changes are mostly mechanical, but I found a couple of critical issues in the newly added vllm/entrypoints/openai/chat_completion/protocol.py file: a syntax error in an import statement and a missing import for FunctionDefinition. These issues will prevent the code from running and need to be addressed.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Jan 13, 2026
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 13, 2026
Member

@DarkLight1337 DarkLight1337 left a comment


LGTM as long as tests pass

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Jan 13, 2026
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) January 13, 2026 11:14
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) January 13, 2026 11:18
@chaunceyjiang chaunceyjiang merged commit fefce49 into vllm-project:main Jan 13, 2026
50 checks passed
@chaunceyjiang chaunceyjiang deleted the vllm_open_refactor branch January 13, 2026 13:06
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…architecture (vllm-project#32240)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
shun-cohere added a commit to cohere-ai/melody that referenced this pull request Apr 14, 2026
- Branch OpenAI entrypoint imports on vllm version: vllm > 0.14.1 uses
  the reorganized paths introduced in vllm-project/vllm#32240
- Add ty: ignore[unresolved-import] suppressions for version-gated
  imports that may not exist in the installed vllm
- Matrix py-check CI job across vllm 0.14.1 and 0.15.1
- Fix is_reasoning_end signature: list[int] -> Sequence[int] to match
  the abstract base class
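The `is_reasoning_end` signature fix mentioned above can be illustrated with a minimal sketch (the class names and token id below are hypothetical, not the actual melody or vLLM code): typing the parameter as `Sequence[int]` rather than `list[int]` matches the abstract base class and lets callers pass lists, tuples, or any other integer sequence.

```python
from collections.abc import Sequence


class ReasoningParserBase:
    """Sketch of an abstract base that accepts any integer sequence."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        raise NotImplementedError


class Cohere2Parser(ReasoningParserBase):
    END_OF_REASONING_ID = 7  # hypothetical end-of-reasoning token id

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        # Sequence[int] (not list[int]) matches the base class signature,
        # so both lists and tuples of token ids are accepted.
        return self.END_OF_REASONING_ID in input_ids
```

Narrowing the parameter to `list[int]` in the subclass would violate the base class contract (parameters should be contravariant), which is what type checkers like `ty` flag.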
walterbm-cohere pushed a commit to cohere-ai/melody that referenced this pull request Apr 14, 2026
## Description

This PR aims to support vLLM v0.15.1 and newer versions.
To do this, we introduce conditional import logic at the top of
`parser.py`
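Conditional import logic of this kind could look something like the generic helper below (a sketch only; the helper name is invented for illustration, and the vllm module paths in the docstring are assumptions about the post-reorganization layout):

```python
import importlib


def import_first(*paths):
    """Return the first module from *paths* that imports successfully.

    A plugin like parser.py could call this with the reorganized path first,
    e.g. "vllm.entrypoints.openai.chat_completion.protocol" (hypothetical),
    falling back to the legacy "vllm.entrypoints.openai.protocol".
    """
    errors = []
    for path in paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError("no candidate module importable: " + "; ".join(errors))
```

Because the fallback is attempted at import time, the same plugin file loads cleanly against both the old and the new vLLM layouts.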


## Related Issue


vllm-project/vllm#32240 introduces new
structure, which is a breaking change for melody

## Motivation and Context


Melody does not support vLLM v0.15 or newer versions.

## How Has This Been Tested?


### Check 1: Confirm that imports work with both vLLM versions

```
$ uv pip list | grep vllm
vllm                              0.15.1
$ uv run cohere_melody_vllm/parser.py
# no error
```

```
$ uv pip list | grep vllm
vllm                              0.14.1
$ uv run cohere_melody_vllm/parser.py
# no error
```


### Check 2: Tool calling works with vLLM v0.15.1

Start server

```
uv run vllm serve CohereLabs/c4ai-command-r7b-12-2024 --reasoning-parser cohere2 --reasoning-parser-plugin ./cohere_melody_vllm/parser.py --tool-parser-plugin ./cohere_melody_vllm/parser.py --tool-call-parser cohere2 --enable-auto-tool-choice
```

and then send a tool calling query

```
$ uv run tool.py
ChatCompletion(id='chatcmpl-a8a2b2e52a4dc558', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-a9f3c1ba457d277f', function=Function(arguments='{"location": "San Francisco, California", "unit": "celsius"}', name='get_weather'), type='function')], reasoning='I will use the get_weather tool to find out the weather in San Francisco, California in Celsius.', reasoning_content='I will use the get_weather tool to find out the weather in San Francisco, California in Celsius.'), stop_reason=None, token_ids=None)], created=1776133545, model='CohereLabs/c4ai-command-r7b-12-2024', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=69, prompt_tokens=1302, total_tokens=1371, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
'Function called: get_weather'
'Arguments: {"location": "San Francisco, California", "unit": "celsius"}'
'Result: Getting the weather for San Francisco, California in celsius...'
```

---

> [!NOTE]
> **Medium Risk**
> Adds version-dependent imports for vLLM OpenAI protocol types, so a mistake in version detection or module paths could cause runtime import failures across supported vLLM versions.
>
> **Overview**
> Adds vLLM version-aware import logic in `cohere_melody_vllm/parser.py` to handle the OpenAI entrypoint protocol module reorganization introduced after vLLM `0.14.1`, enabling the plugin to run against both old and new layouts.
>
> Updates the Python bindings CI `py-check` job to run `ty check` in a matrix against vLLM `0.14.1` and `0.15.1` to continuously validate compatibility.

Labels

deepseek Related to DeepSeek models frontend gpt-oss Related to GPT-OSS models llama Related to Llama models multi-modality Related to multi-modality (#4194) qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed tool-calling v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants